Regret Bounds for the Adaptive Control of Linear Quadratic Systems

نویسندگان

  • Yasin Abbasi-Yadkori
  • Csaba Szepesvári
چکیده

We study the average cost Linear Quadratic (LQ) control problem with unknown model parameters, also known as the adaptive control problem in the control community. We design an algorithm and prove that apart from logarithmic factors its regret up to time T is O( √ T ). Unlike previous approaches that use a forced-exploration scheme, we construct a high-probability confidence set around the model parameters and design an algorithm that plays optimistically with respect to this confidence set. The construction of the confidence set is based on the recent results from online least-squares estimation and leads to improved worst-case regret bound for the proposed algorithm. To the best of our knowledge this is the the first time that a regret bound is derived for the LQ control problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finite Time Analysis of Optimal Adaptive Policies for Linear-Quadratic Systems

We consider the classical problem of control of linear systems with quadratic cost. When the true system dynamics are unknown, an adaptive policy is required for learning the model parameters and planning a control policy simultaneously. Addressing this trade-off between accurate estimation and good control represents the main challenge in the area of adaptive control. Another important issue i...

متن کامل

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O( √ T ) was shown, apart form logarithmic factors. However, this bound scales exponentially with p, the dimension of the ...

متن کامل

Regret bounds for Non Convex Quadratic Losses Online Learning over Reproducing Kernel Hilbert Spaces

We present several online algorithms with dimension-free regret bounds for general nonconvex quadratic losses by viewing them as functions in Reproducing Hilbert Kernel Spaces. In our work we adapt the Online Gradient Descent, Follow the Regularized Leader and the Conditional Gradient method meta algorithms for RKHS spaces and provide regret bounds in this setting. By analyzing them as algorith...

متن کامل

Higher-Order Regret Bounds with Switching Costs

This work examines online linear optimization with full information and switching costs (SCs) and focuses on regret bounds that depend on properties of the loss sequences. The SCs considered are bounded functions of a pair of decisions, and regret is augmented with the total SC. We show under general conditions that for any normed SC, σ(x,x) = ‖x−x′‖, regret cannot be bounded given only a bound...

متن کامل

Tracking Adversarial Targets

We study linear control problems with quadratic losses and adversarially chosen tracking targets. We present an efficient algorithm for this problem and show that, under standard conditions on the linear system, its regret with respect to an optimal linear policy grows as O(log T ), where T is the number of rounds of the game. We also study a problem with adversarially chosen transition dynamic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011